Crowdsourcing
Verbosity (von Ahn et al., 2006) was one of the first attempts at gathering annotations with a GWAP (game with a purpose). Snow et al. (2008) described design and evaluation guidelines for five natural language micro-tasks. However, they explicitly chose tasks that non-expert contributors could easily understand, which left the recruitment and training issues open.

NLP Tasks

* Coreference resolution: Phrase Detectives (Chamberlain et al., 2008; Chamberlain et al., 2009), QuizBowl (Guha et al., 2015)
* Textual entailment: Negri et al. (2011) (cross-lingual)
* Semantic role labeling: Hong and Baker (2011), Baker (2012)

Annotation models

Most tasks are modeled the way a classification problem is modeled in machine learning: annotators choose from a set of k categories that is the same across all questions. Phrase Detectives (Chamberlain et al., 2008) departs from this. It asks annotators to choose among three options: non-referring, discourse-new, or discourse-old. For a discourse-old mention, they must also identify the most recent mention of the same entity. Because the candidate antecedents differ from mention to mention, the label set is no longer fixed across questions. Paun et al. (2018) attempt to model this setting in a pairwise manner.

Aggregation methods

Majority vote

Majority vote picks, for each item, the label chosen by most annotators, and it treats every annotator as equally reliable. Probabilistic models relax this assumption. From Paun et al. (2018): "Probabilistic models of annotation, in particular, make it possible to characterize the accuracy of the annotators and correct for their bias (Dawid and Skene, 1979; Passonneau and Carpenter, 2014), to account for item-level effects (e.g.: difficulty) (Whitehill et al., 2009), and to employ different pooling strategies (Carpenter, 2008)"

Platforms

Amazon's Mechanical Turk

Usage: NAACL HLT (2010), Laws et al. (2011).

Motivation of workers: Antin and Shaw (2012) found that monetary reward is the most important reason workers come to the website. Even so, more than half of the workers also come for "fun". They argue that the results obtained by Ipeirotis (2010) are distorted by social desirability bias. Litman et al. (2015) argue that money is the most important motivation and that "data quality is directly affected by compensation rates for India-based participants".

Recommendations:

* Increase intrinsic motivation. From Paolacci and Chandler (2014): "Thanking workers and explaining to them the meaning of the task they will complete can stimulate better work (D. Chandler & Kapelner, 2013), as does framing a task as requested by a nonprofit organization (Rogstadius et al., 2011)."

CrowdFlower

Usage: He et al. (2016).

References

* Antin, J., & Shaw, A. (2012). Social desirability bias and self-reports of motivation. In Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems (CHI '12), 2925. http://doi.org/10.1145/2207676.2208699
* Baker, C. F. (2012). FrameNet, current collaborations and future goals. Language Resources and Evaluation, 1–18.
* Chamberlain, J., Poesio, M., & Kruschwitz, U. (2008). Phrase Detectives: A web-based collaborative annotation game. In Proceedings of I-Semantics, Graz.
* Chamberlain, J., Kruschwitz, U., & Poesio, M. (2009). Constructing an anaphorically annotated corpus with non-experts: Assessing the quality of collaborative annotations. In Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources (pp. 57–62). Association for Computational Linguistics.
* Chandler, D., & Kapelner, A. (2013). Breaking monotony with meaning: Motivation in crowdsourcing markets. Journal of Economic Behavior & Organization, 90, 123–133.
* Guha, A., Iyyer, M., Bouman, D., & Boyd-Graber, J. (2015). Removing the training wheels: A coreference dataset that entertains humans and challenges computers. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
* Hong, J., & Baker, C. F. (2011). How good is the crowd at "real" WSD? ACL HLT 2011, 30.
* Ipeirotis, P. (2010). New demographics of Mechanical Turk. http://behind-the-enemy-lines.blogspot.com/2010/03/new-demographics-of-mechanical-turk.html
* Laws, F., Scheible, C., & Schütze, H. (2011). Active learning with Amazon Mechanical Turk. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1546–1556).
* Litman, L., Robinson, J., & Rosenzweig, C. (2015). The relationship between motivation, monetary compensation, and data quality among US- and India-based workers on Mechanical Turk. Behavior Research Methods, 47(2), 519–528. http://doi.org/10.3758/s13428-014-0483-x
* NAACL HLT (2010). Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.
* Negri, M., Bentivogli, L., Mehdad, Y., Giampiccolo, D., & Marchetti, A. (2011). Divide and conquer: Crowdsourcing the creation of cross-lingual textual entailment corpora. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11 (pp. 670–679). Stroudsburg, PA, USA: Association for Computational Linguistics.
* Paolacci, G., & Chandler, J. (2014). Inside the Turk: Understanding Mechanical Turk as a participant pool. Current Directions in Psychological Science, 23(3), 184–188. http://doi.org/10.1177/0963721414531598
* Paun, S., Chamberlain, J., Kruschwitz, U., Yu, J., & Poesio, M. (2018). A probabilistic annotation model for crowdsourcing coreference. In Proceedings of EMNLP 2018 (pp. 1926–1937).
* Rogstadius, J., Kostakos, V., Kittur, A., Smus, B., Laredo, J., & Vukovic, M. (2011, July). An assessment of intrinsic and extrinsic motivation on task performance in crowdsourcing markets. Paper presented at the 5th International AAAI Conference on Weblogs and Social Media, Barcelona, Spain.
* Snow, R., O'Connor, B., Jurafsky, D., & Ng, A. Y. (2008). Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 254–263). Association for Computational Linguistics.
* von Ahn, L., Kedia, M., & Blum, M. (2006). Verbosity: A game for collecting common-sense facts. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 75–78). ACM.
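Returning to the Aggregation methods section, the majority vote described there can be sketched in a few lines of Python. This is a minimal illustration and does not come from any of the cited works. The function names are assumptions made for the example, and so is the choice to return None on ties. The Phrase Detectives-style labels are also assumed: "NR" for non-referring, "DN" for discourse-new, and a ("DO", antecedent) pair for discourse-old. The data is invented.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence


def majority_vote(labels: Sequence[Hashable]) -> Optional[Hashable]:
    """Most frequent label, or None if there are no labels or the top count is tied."""
    if not labels:
        return None
    ranked = Counter(labels).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave unresolved rather than pick arbitrarily
    return ranked[0][0]


def aggregate(annotations: Dict[str, List[Hashable]]) -> Dict[str, Optional[Hashable]]:
    """Apply majority vote independently to every item (here: every mention)."""
    return {item: majority_vote(labels) for item, labels in annotations.items()}


# Hypothetical labels: "NR" = non-referring, "DN" = discourse-new,
# ("DO", antecedent_id) = discourse-old, linked to an earlier mention.
example = {
    "m3": ["DN", ("DO", "m1"), ("DO", "m1"), ("DO", "m2")],
    "m4": ["NR", "NR", "DN"],
    "m5": ["DN", ("DO", "m1")],
}
print(aggregate(example))
# {'m3': ('DO', 'm1'), 'm4': 'NR', 'm5': None}
```

Votes for discourse-old that name different antecedents count as different labels, so they split the vote. For m3, three of the four annotators said discourse-old, but only two agree on the antecedent. This is the item-dependent label set discussed under Annotation models.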
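The Paun et al. (2018) quote credits Dawid and Skene (1979) with models that estimate each annotator's accuracy and bias. The sketch below is one minimal expectation-maximization (EM) reading of that confusion-matrix idea. It assumes a fixed set of k classes, which is the classification setting described under Annotation models. It is not the mention-pair model of Paun et al. (2018). Several details are assumptions for illustration: the function name, the additive smoothing constant, the soft-majority initialization, the fixed iteration count, and the toy data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One annotation: (item_id, annotator_id, label), with label an integer in range(k).
Annotation = Tuple[str, str, int]


def dawid_skene(annotations: List[Annotation], k: int, iters: int = 50,
                smoothing: float = 0.01) -> Dict[str, List[float]]:
    """Return, for each item, an estimated posterior over the k true classes."""
    items = sorted({item for item, _, _ in annotations})
    workers = sorted({worker for _, worker, _ in annotations})
    by_item: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for item, worker, label in annotations:
        by_item[item].append((worker, label))

    # Initialize posteriors with normalized vote counts (a soft majority vote).
    post: Dict[str, List[float]] = {}
    for item in items:
        counts = [0.0] * k
        for _, label in by_item[item]:
            counts[label] += 1.0
        total = sum(counts)
        post[item] = [c / total for c in counts]

    for _ in range(iters):
        # M-step: class priors and one k x k confusion matrix per annotator,
        # where conf[w][true][given] estimates P(w gives `given` | class `true`).
        prior = [sum(post[i][c] for i in items) / len(items) for c in range(k)]
        conf = {w: [[smoothing] * k for _ in range(k)] for w in workers}
        for item in items:
            for worker, label in by_item[item]:
                for c in range(k):
                    conf[worker][c][label] += post[item][c]
        for w in workers:
            for c in range(k):
                row_total = sum(conf[w][c])
                conf[w][c] = [x / row_total for x in conf[w][c]]

        # E-step: combine the prior with every label the item received.
        # Plain products are fine for a few labels per item; use logs for many.
        for item in items:
            scores = []
            for c in range(k):
                score = prior[c]
                for worker, label in by_item[item]:
                    score *= conf[worker][c][label]
                scores.append(score)
            total = sum(scores)
            post[item] = [s / total for s in scores]
    return post


# Invented toy data: w3 always contradicts w1 and w2 on q1 to q4.
toy = [
    ("q1", "w1", 0), ("q1", "w2", 0), ("q1", "w3", 1),
    ("q2", "w1", 1), ("q2", "w2", 1), ("q2", "w3", 0),
    ("q3", "w1", 0), ("q3", "w2", 0), ("q3", "w3", 1),
    ("q4", "w1", 1), ("q4", "w2", 1), ("q4", "w3", 0),
    ("q5", "w1", 1), ("q5", "w2", 0), ("q5", "w3", 1),
]
posterior = dawid_skene(toy, k=2)
for item, probs in posterior.items():
    print(item, [round(p, 3) for p in probs])
```

In the toy data, w3 contradicts the other two annotators on q1 to q4, so the fitted confusion matrix for w3 ends up close to a label flip. On q5, majority vote would return 1, since w1 and w3 outvote w2. The model instead treats w3's 1 as evidence for class 0 and favors class 0.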